I. "MOTION COHERENCE"


A.  The  Experimental  Findings 

To find evidence for the operations of a trajectory network, a small signal was embedded within a large field of random noise. The signal was a single dot, moving in apparent motion in a consistent direction, amidst identical noise dots in random apparent motion. Casco, Morgan and Ward (1989) had measured "dmax" for a single dot presented in the midst of dynamic visual noise, but in their study, the motion of the signal dot was quite different from that of the flickering noise dots¹. In our paradigm, the spatial (step size) and temporal (frame rate) characteristics of the motion were the same for both the signal and noise dots. Thus, it was impossible to discriminate between the signal and noise dots on the basis of any pair of frames. Only the consistency of its direction distinguished the motion of the signal dot from that of the noise dots.


Figure  1 

A schematic representation of the display is shown in Figure 1. The noise dots were presented within a 10-degree circular aperture. The signal dot moved in one of eight directions chosen at random, as indicated by the dashed arrows drawn around the perimeter of the circle in Figure 1. Observers were instructed to fixate the center of the aperture (shown by the x in the center). The middle of the trajectory was constrained to lie within a 2-by-2 degree window centered on the fixation point, but the vertical and horizontal location of the trajectory was changed at random from trial to trial. Hence, it seldom passed directly under the fixation point. In a 2AFC paradigm, observers were shown two presentations, one containing only the noise dots and the other the noise plus the signal dot; they judged which of the two presentations contained the signal. Feedback was provided in all experiments.

¹ "dmax" is the maximum displacement in a single jump (2 frames) for which the direction of the displacement can be reliably judged according to a standard threshold criterion.

Our  results  led  to  three  conclusions: 

1) Detection is limited by the probability of mismatch, not noise density.

Detection was measured as a function of noise density with step size as a parameter. As shown in the upper half of Figure 2, the effect of noise density depended systematically on the step size. At large step sizes, even low-density noise (0.5 dots/deg²) degraded detection, while at small step sizes, low-density noise had almost no effect. Recall that both the signal and the noise dots were moving in apparent (sampled) motion. In the context of apparent motion, one quantity that depends jointly on step size and density is the probability of a mismatch. Assuming nearest-neighbor matching, the probability of a mismatch is given by the following equation:

$$\text{Probability of mismatch} = 1 - e^{-\pi h^2 d}$$

where d is the density of the noise in dots/deg² and h is the step size in degrees (Ullman, 1979; Williams and Sekuler, 1984). The mean number of noise dots in a circular area of radius equal to the step size is πh²d, which equals the mean number of noise dots lying closer to the position of the signal dot in frame n than the signal dot itself lies in frame n + 1. If this number is sufficiently small, the count of such noise dots is well described by a Poisson distribution, so the probability that no noise dot causes a mismatch equals

$$e^{-\pi h^2 d}.$$
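The mismatch probability is simple to evaluate and to check by simulation. The Python sketch below is illustrative only: the helper names are hypothetical, and the Monte Carlo check assumes noise dots are scattered as a homogeneous Poisson process around the signal dot. It also inverts the formula to find the density that yields a target mismatch probability, and confirms the self-similarity of the displays: doubling h while quartering d leaves the probability unchanged.

```python
import math
import random


def p_mismatch(density, step):
    """Nearest-neighbour mismatch probability, 1 - exp(-pi * h**2 * d).

    density: noise density d in dots/deg^2; step: step size h in deg.
    """
    return 1.0 - math.exp(-math.pi * step ** 2 * density)


def density_for_mismatch(p, step):
    """Invert p_mismatch: the density d that gives mismatch probability p."""
    return -math.log(1.0 - p) / (math.pi * step ** 2)


def _poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_mismatch(density, step, trials=20000, half_width=1.0, seed=1):
    """Monte Carlo estimate of the mismatch probability.

    Noise dots are scattered as a Poisson process in a square (half-width in
    deg, assumed >= step) centred on the signal dot's frame-n position. A trial
    counts as a mismatch if any noise dot lies closer to that position than
    the signal dot's own frame-(n+1) position, which is one step h away.
    """
    rng = random.Random(seed)
    expected_dots = density * (2.0 * half_width) ** 2
    mismatches = 0
    for _trial in range(trials):
        for _dot in range(_poisson(rng, expected_dots)):
            x = rng.uniform(-half_width, half_width)
            y = rng.uniform(-half_width, half_width)
            if x * x + y * y < step * step:
                mismatches += 1
                break
    return mismatches / trials


if __name__ == "__main__":
    h, d = 0.16, 3.0
    print(p_mismatch(d, h), simulate_mismatch(d, h))
    print(p_mismatch(d / 4.0, 2.0 * h))  # self-similar display: same value
```

For example, `p_mismatch(3.0, 0.16)` is about 0.21, and the simulated estimate should agree with it to within sampling error.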

When  the  detection  data  were  plotted  as  a  function  of  the  probability  of  mismatch,  all  the 
curves  for  the  different  step  sizes  were  superimposed  (lower  half  of  Figure  2).  Does  this  result 
mean  that  the  human  motion  system  is  actually  solving  the  motion  correspondence  problem  based 
on  nearest  neighbor  matching?  Not  necessarily.  Rather,  the  superimposed  data  in  the  lower  half 
of  Figure  2  may  mean  that  the  process  that  limits  detection  is  automatically  scaled  with  step  size. 
The  step  sizes  of  the  signal  and  noise  dots  were  kept  the  same  in  order  to  obscure  the  motion  of  the 
signal  dot,  so  our  displays  were  self-similar.  Instead  of  varying  step  size  and  density,  we  could 
have  achieved  the  same  results  by  increasing  the  viewing  distance  from  the  display.  As  our  next 
result  will  show,  mismatching,  based  strictly  on  nearest  neighbor  matching,  is  not  the  factor 
limiting  signal  detection. 

2) Trajectory detection depends on motion cues, not position cues

Since the trajectory motion traced out a straight string of dots over time, it was possible that observers were detecting the signal on the basis of dot collinearity, i.e., on the basis of a position or form cue rather than a motion cue (Uttal, Brunnell and Corwin, 1970; Falzett and Lappin, 1983). To test whether this was the case, we altered the directional characteristics of the noise.
Rather  than  choosing  the  directions  of  the  noise  vectors  from  all  360  degrees,  we  narrowed  the 
range  to  180  degrees  centered  on  the  rightward  direction.  This  change  made  the  noise  appear  to 
flow  globally  towards  the  right  (Williams  and  Sekuler,  1984).  As  before,  the  signal  dot  was 
moved  in  one  of  eight  directions  chosen  at  random,  but  for  this  experiment,  we  scored  the 
percentage  correct  as  a  function  of  the  direction  of  the  signal  dot.  Figure  3  shows  that  detection 
depends  strongly  on  the  direction  of  the  signal  dot  relative  to  the  directional  range  of  the  noise. 
When  the  signal  dot  moves  to  the  left,  it  is  almost  perfectly  detected,  but  when  it  moves  to  the 
right,  in  the  mean  direction  of  the  noise,  it  is  much  harder  to  detect.  If  observers  are  detecting  the 
signal  dot  on  the  basis  of  a  position  or  form  cue,  detection  should  not  depend  on  the  mean  direction 
of  the  random  motion  noise;  only  a  motion  signal  would  be  affected  in  the  manner  shown  in  Figure 
3.  These  results  argue  that  signal  detection  depends  on  a  motion  unit  responsive  only  to  a  small 
range of directions. Note further that the probability of mismatch, which depends only on step size and
density,  is  the  same  whether  the  signal  dot  moves  left  or  right,  so  mismatching,  based  on  nearest 
neighbor  matching,  does  not  explain  detection  in  this  experiment.  The  dependence  on  probability 
of  mismatch  shown  in  Figure  2  implies  that  the  effect  of  the  noise  scales  with  step  size,  while  the 
results  in  Figure  3  show  that  only  noise  in  the  direction  of  the  trajectory  signal  affects  detection. 
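To make the directional logic concrete, the sketch below computes how much of the noise falls within the directional band of a unit tuned to each of the eight signal directions, for the restricted (180 deg) noise and for isotropic noise. It assumes uniformly distributed noise directions and borrows the crude all-or-none 45 deg tuning of the model implementation described later in this report; both are simplifying assumptions, and the function name is hypothetical.

```python
def noise_fraction_in_band(signal_deg, noise_center=0.0, noise_range=180.0,
                           bandwidth=45.0):
    """Fraction of noise vectors whose direction falls inside a unit's band.

    Noise directions are assumed uniform over `noise_range` degrees centred on
    `noise_center` (0 = rightward); the unit is assumed to respond only to
    directions within +/- bandwidth/2 of `signal_deg` (all-or-none tuning).
    """
    lo_n = noise_center - noise_range / 2
    hi_n = noise_center + noise_range / 2
    overlap = 0.0
    for shift in (-360.0, 0.0, 360.0):  # check wrapped copies of the band
        lo_b = signal_deg - bandwidth / 2 + shift
        hi_b = signal_deg + bandwidth / 2 + shift
        overlap += max(0.0, min(hi_n, hi_b) - max(lo_n, lo_b))
    return overlap / noise_range


if __name__ == "__main__":
    for direction in range(0, 360, 45):
        restricted = noise_fraction_in_band(direction)
        isotropic = noise_fraction_in_band(direction, noise_range=360.0)
        print(f"{direction:3d} deg: restricted {restricted:.3f}, "
              f"isotropic {isotropic:.3f}")
```

Under these assumptions, a unit tuned to the leftward (180 deg) direction receives no noise from the restricted display, while a unit tuned rightward receives twice its isotropic share, the same ordering seen in the detection data of Figure 3.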
Figure 3

3) Detection is not limited by the response of single independent motion detectors

One explanation for our results is that the size of a directionally-sensitive motion detector responding to the trajectory scales with step size. Since the frame rate was the same for all our displays, the speed of both the signal and noise dots increased systematically with step size. Motion units responsive to different speeds could have different-sized receptive fields. The temporal integration time associated with primary motion detectors is roughly 100 msec (Mikami, Newsome and Wurtz, 1986; Burr, 1981; Watson and Nachmias, 1977). If all motion detectors have about the same integration time, then slow speeds (or small step sizes) are detected by motion units with small receptive fields, and fast speeds (or large step sizes) by units with large receptive fields (McKee and Nakayama, 1984). As shown by the diagram in Figure 4, the interaction between step size and density, i.e., the "probability of mismatch", is predictable by assuming that the receptive field of the motion detector scales with step size. If detection is limited by the quantity of noise falling within a single motion detector, then increasing the noise density for small step sizes guarantees that the same amount of noise falls within the small receptive fields responding to small step sizes as within the larger fields responding to the larger step sizes. Of course, only the noise vectors that stimulate the preferred direction of the unit responding to the trajectory will affect detection, but the quantity of directional noise will also remain constant if the field size of the motion detector is scaled with step size.
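The scaling argument can be checked numerically. If the field diameter is a fixed multiple k of the step size (k = 4, i.e., four hops, is an illustrative assumption), the expected number of noise dots per frame inside the field is d·π(kh/2)² = (k/2)²·πh²d, a function of the mismatch probability alone. The sketch below, with hypothetical helper names, confirms this for three step sizes held at the same mismatch probability.

```python
import math


def density_for_mismatch(p, step):
    """Noise density (dots/deg^2) giving mismatch probability p at step h."""
    return -math.log(1.0 - p) / (math.pi * step ** 2)


def noise_dots_in_field(density, step, diameter_in_steps=4.0):
    """Expected noise dots per frame inside a circular field whose diameter
    is a fixed multiple of the step size (4 steps assumed by default)."""
    radius = diameter_in_steps * step / 2.0
    return density * math.pi * radius ** 2


def noise_dots_at_mismatch(p, diameter_in_steps=4.0):
    """Closed form: (k/2)^2 * pi * h^2 * d = -(k/2)^2 * ln(1 - p)."""
    return -(diameter_in_steps / 2.0) ** 2 * math.log(1.0 - p)


if __name__ == "__main__":
    for step in (0.08, 0.16, 0.32):
        d = density_for_mismatch(0.3, step)
        print(f"h={step:.2f} deg, d={d:6.2f} dots/deg^2, "
              f"noise per field per frame={noise_dots_in_field(d, step):.3f}")
```

Each step size yields about 1.43 noise dots per field per frame at a mismatch probability of 0.3, even though the density differs by a factor of 16 across the three step sizes.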

Figure 4

Our next experiment showed that this scaled-receptive-field explanation, however attractive, was not sufficient to account for trajectory detection. The signal trajectory did not have to be straight to be easily detected. We had measured the tolerance for slow changes in direction from frame to frame. Provided the change in direction was small (<45 deg), detection was fairly high (75% correct) for signal dots presented in dense noise. This result suggested that circular trajectories would also be quite visible, and indeed, we found that a circular trajectory was about as detectable as a straight one. Note what happens to the signal falling within a single motion unit if the trajectory changes direction slowly, by 24 degrees each frame. By the end of five frames (100 msec), the trajectory is moving orthogonal to its initial direction (see Figure 5). Obviously, this is not a good signal for a unit tuned to the initial direction. Nor is it a particularly good signal for a unit tuned to an oblique direction, one that might respond to the chord of the arc formed by the circular trajectory (the first and fifth dots of the sequence), since such units would see less of the signal. Yet human performance is the same for the circular and straight trajectories. This result argues for some mechanism that joins signals from adjacent units with similar directional tuning.
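A rough geometric tally shows why neither unit is ideal. The sketch assumes five dots joined by four unit-length hops whose headings are 24, 48, 72 and 96 deg relative to the initial direction (so the last hop is roughly orthogonal to it), and measures the signal a unit receives as the sum of the hops projected onto its preferred direction. This cosine-projection measure is an illustrative assumption, not the tuning curve of any model unit.

```python
import math


def projected_signal(headings_deg, preferred_deg):
    """Sum of unit-length hops projected onto a unit's preferred direction."""
    return sum(math.cos(math.radians(h - preferred_deg)) for h in headings_deg)


def chord_direction(headings_deg):
    """Direction (deg) of the chord joining the first and last dots."""
    x = sum(math.cos(math.radians(h)) for h in headings_deg)
    y = sum(math.sin(math.radians(h)) for h in headings_deg)
    return math.degrees(math.atan2(y, x))


straight = [0.0] * 4                           # four hops, constant heading
circular = [24.0 * k for k in range(1, 5)]     # assumed: turns 24 deg per hop

if __name__ == "__main__":
    full = projected_signal(straight, 0.0)
    chord = chord_direction(circular)
    print(f"unit tuned to initial direction: "
          f"{projected_signal(circular, 0.0) / full:.2f}")
    print(f"unit tuned to chord ({chord:.0f} deg): "
          f"{projected_signal(circular, chord) / full:.2f}")
```

Under these assumptions, the unit tuned to the initial direction receives about 45% of the signal that a straight trajectory would deliver, and the unit tuned to the 60 deg chord about 89%.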


Figure  5 

Even more powerful evidence against this single-motion-unit hypothesis comes from consideration of the noise falling within each putative motion unit. Although the noise filling the screen is impressive, the noise associated with each motion unit is very small. For example, if the diameter of the responding motion unit corresponds to the distance moved by the trajectory in five frames (four hops), then, for a step size of 0.16 deg, the receptive field has an area of 0.32 deg². At a noise density of 3 dots/deg², about 1 noise dot falls in this unit every frame, or 5 noise dots in 100 msec, the same number as the signal. The signal-to-noise ratio is even higher than this value of 1, since most of the motion vectors specified by the noise dots are apt to be orthogonal or opposite to the preferred direction of the unit. Therefore, considering that Newsome, Britten and Movshon (1989) reported that a correlated directional signal of 10% is detectable in noise both by human observers and by single units in MT, a single-motion-unit hypothesis would predict almost no effect of noise on performance. But, as shown in Figure 2, this amount of noise does greatly degrade detection.
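The arithmetic of this argument can be reproduced directly. The step size, density and frame count come from the text; the 45 deg all-or-none directional band and the use of the signal's share of dots as a rough stand-in for "coherence" are illustrative assumptions, not part of the original analysis.

```python
import math

STEP = 0.16         # deg per frame (value from the text)
DENSITY = 3.0       # noise dots per deg^2 (value from the text)
FRAMES = 5          # about 100 msec, i.e. 20 msec per frame
SIGNAL_DOTS = 5     # one signal dot per frame over the same five frames

diameter = 4 * STEP                          # four hops -> 0.64 deg
area = math.pi * (diameter / 2) ** 2         # about 0.32 deg^2
noise_per_frame = DENSITY * area             # about 1 noise dot per frame
noise_in_window = noise_per_frame * FRAMES   # about 5 noise dots per 100 msec

# Assumption: isotropic noise and a 45-deg directional band, so only
# 45/360 = 1/8 of the noise vectors fall in the unit's preferred band.
in_band_noise = noise_in_window * 45.0 / 360.0

# Rough analogue of a coherence level: the signal's share of all dots.
coherence = SIGNAL_DOTS / (SIGNAL_DOTS + noise_in_window)

print(f"area={area:.3f} deg^2, noise/frame={noise_per_frame:.2f}, "
      f"noise/100 ms={noise_in_window:.2f}, in-band noise={in_band_noise:.2f}, "
      f"signal share={coherence:.2f}")
```

Even before directional tuning is considered, the signal makes up about half of the dots in the field, far above the 10% level cited above.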

To make this argument more quantitative, Grzywacz, Watamaniuk and McKee (see enclosed paper) simulated the response of Gabor motion energy units to 6 frames (120 msec) of a signal dot moving on a straight trajectory through random-motion noise. Figure 6 plots an estimate of the relative response of these units to targets moving in their preferred and null directions²:

$$\chi^2 = \frac{R_p - R_n}{\lvert R_p - R_n \rvert} \cdot \frac{(R_p - R_n)^2}{R_p + R_n}$$

where R_p and R_n are the preferred and null responses respectively, the second fraction is a χ² measure around the mean, (R_p + R_n)/2, and the first fraction gives the sign of R_p - R_n. The response is plotted as a function of the receptive field radius³ in units of step size. For units with larger receptive fields, directional selectivity is close to zero due to the effect of the noise. The best signal is found at a receptive field radius of 2.3 steps, i.e., a diameter of about 4.6 steps, roughly the distance traversed in 5 frames.
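The statistic described in this paragraph (a χ² of the two responses about their mean, signed by R_p - R_n) can be computed as follows. The sketch assumes nonnegative responses and returns zero when both responses are zero; the function name is hypothetical.

```python
import math


def signed_chi2(r_pref, r_null):
    """Signed chi-square directional index.

    The magnitude is the chi-square of the two responses about their mean
    (r_pref + r_null) / 2, which simplifies to (Rp - Rn)^2 / (Rp + Rn);
    the sign is that of Rp - Rn, so null-direction dominance is negative.
    """
    total = r_pref + r_null
    if total <= 0:
        return 0.0
    diff = r_pref - r_null
    return math.copysign(diff * diff / total, diff)


if __name__ == "__main__":
    print(signed_chi2(10.0, 5.0), signed_chi2(20.0, 10.0), signed_chi2(5.0, 10.0))
```

The footnote's point is visible here: `signed_chi2(10, 5)` and `signed_chi2(20, 10)` share the same preferred/null ratio, but the second index is twice the first, so the statistic grows with absolute response amplitude.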


Figure 6 (abscissa of both panels: receptive field radius, in units of step size)

² This statistic was used rather than the simple ratio of preferred to null responses because χ² is sensitive to absolute response amplitude, and therefore captures the statistical significance of the ratio.

³ As given by the standard deviations of the Gaussian envelopes of the Gabor filters used for the simulations.


A  secondary  positive  peak  is  found  at  a  radius  equal  to  the  step  size.  The  negative  response  at  a 
radius of 1.5 steps indicates aliasing, i.e., the motion unit responds more vigorously to a signal in
the  null  direction  than  to  one  in  the  preferred  direction.  The  two  adjacent  graphs  show  simulations 
for  two  different  probabilities  of  mismatch  (0.3  and  0.5),  which,  for  any  given  step  size, 
correspond  to  two  different  noise  densities.  Note  that  the  magnitude  of  the  peak  response  is  barely 
affected  by  increasing  the  noise  density,  although  human  detection  is  near  chance  for  a  probability 
of  mismatch  of  0.5.  So,  if  human  motion  units  resemble  the  units  in  the  motion  energy  model, 
noise  in  single  independent  motion  units  cannot  account  for  our  results. 

If  we  were  to  place  a  small  circular  aperture  on  the  screen  so  that  five  frames  of  the 
trajectory  moved  through  the  aperture  on  every  trial,  the  observer  would  always  detect  the  signal, 
unless  the  noise  was  so  dense  that  it  virtually  filled  the  aperture.  Why  is  the  signal  dot  so  difficult 
to  detect  in  the  sparse  noise  of  our  much  larger  displays?  One  possibility  is  that  the  observer  is 
engaged  in  some  type  of  search  task,  and  that  the  signals  produced  by  the  noise  act  as  "distractors" 
(Treisman  and  Gelade,  1980).  The  observer  is  forced  to  go  through  every  small  area  in  the  display 
looking  for  a  motion  signal  that  is  significantly  larger  than  the  signals  produced  by  the  noise.  A 
higher  noise  density  means  more  distractors,  so  if  the  stimulus  duration  is  short,  the  observer  does 
not  have  time  to  find  the  signal,  and  gets  only  a  small  percentage  correct.  Of  course,  the  trajectory 
signal  does  not  remain  conveniently  in  a  single  location,  but,  for  some  search  strategies,  there 
could  be  as  much  chance  of  encountering  a  moving  signal  as  of  finding  a  five-frame  signal 
presented  repeatedly  in  one  place. 
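The search account can be made concrete with a toy signal-detection observer. This is an illustration, not a model proposed in the report: each unit's response is assumed to be an independent unit-variance Gaussian, the number of units per interval stands in for the number of distractors, and the observer picks the interval holding the single largest response (a max rule). Percent correct falls as distractors are added, as the search account requires, but nothing in the rule ties performance to step size, which is the difficulty raised below.

```python
import random


def pc_2afc_max_rule(n_units, signal_mean=2.0, trials=10000, seed=2):
    """Monte Carlo percent correct for a hypothetical 'search' observer.

    Each 2AFC interval holds n_units independent, unit-variance Gaussian
    responses; in the signal interval one of them has its mean raised by
    signal_mean. The observer chooses the interval with the largest response.
    """
    rng = random.Random(seed)
    correct = 0
    for _trial in range(trials):
        noise_max = max(rng.gauss(0.0, 1.0) for _ in range(n_units))
        distractor_max = max((rng.gauss(0.0, 1.0) for _ in range(n_units - 1)),
                             default=float("-inf"))
        signal_max = max(rng.gauss(signal_mean, 1.0), distractor_max)
        if signal_max > noise_max:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(pc_2afc_max_rule(n), 3))
```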

In this view, there is nothing special about the motion trajectory, so no special network is required to detect it. The trajectory is just an efficient way of increasing the signal in several local motion detectors above the signals produced by the distractor dots in other units. For example, if five frames of sampled motion (100 msec) is the span integrated by the receptive field of a local motion unit, then a straight trajectory of 10 frames produces six overlapping sequences of five frames each. If the observer is searching for any unit with a strong motion signal, he or she has a high chance of finding at least one of these six units. This view cannot be correct, because we have shown that the extended sequential arrangement of trajectory motion is important to detection. In one experiment, we distributed six five-frame segments throughout the noise, equal in number to the segments that form a ten-frame straight trajectory; the trajectory was more easily detected than the distributed components. Also, our results with the circular trajectory show that detection is relatively easy even when none of the local motion units has a signal that is much stronger than the signals produced by the motion of the noise.

The  observed  superiority  of  the  trajectory  is  not  the  only  problem  with  a  simple  search 
explanation.  Why  does  success  in  finding  the  trajectory  dot  depend  on  probability  of  mismatch? 
Why  is  the  trajectory  dot  always  visible  in  the  midst  of  a  hundred  noise  dots  under  some  conditions 
(small  step  sizes),  but  completely  hidden  in  others  (large  step  sizes)?  To  account  for  this  finding, 
the search area would have to grow with speed. Some process would have to reduce the
size  of  the  search  area  to  a  small  region  when  the  speed  is  very  slow,  and  increase  it  to  include  the 
whole  field  when  the  speed  is  very  fast,  thereby  keeping  the  effective  number  of  "distractors" 
constant.  It  has  been  suggested  that  the  size  of  the  window  of  attention  automatically  scales  with 
the  size  or  scale  of  the  searched-for  object  (Sperling  and  Melcher,  1978;  Nakayama,  1990),  but,  to 
our  knowledge,  no  one  has  suggested  an  automatic  scaling  with  speed. 

Cognitive processes such as selective attention undoubtedly define the region over which the brain searches for the signal, but our data suggest that the effects of noise on detection are mediated by the human motion system (Figure 3). Since the noise does not affect the primary motion units directly, we think that the constraints on signal detection are set by neural operations that occur in the motion system after initial processing by the primary motion detectors. These operations, in turn, make it difficult to find a motion unit with a signal greater than that produced by the noise, no matter what search strategy is adopted by the brain. We next describe a computational model of these operations and show that it simulates our experimental results quantitatively.

B.  Computational  Results 

The velocity of a moving object, i.e., any visible feature with contours at various
orientations,  cannot  typically  be  specified  by  a  single  motion  detector,  since  each  detector  responds 
preferentially  to  that  component  of  the  motion  perpendicular  to  its  preferred  orientation.  Signals 
from  motion  units,  responding  differentially  to  the  various  object  contours,  must  be  combined  in 
some  fashion  to  obtain  a  single  coherent  estimate  of  object  velocity.  Motion  coherence  has 
received  considerable  attention  in  the  context  of  plaid  motion  (Adelson  and  Movshon,  1982;  Welch, 
1989;  Ferrera  and  Wilson,  1990;  Wilson,  1991).  Operations  that  account  for  the  coherence  of  plaid 
motion  must,  of  course,  also  apply  to  any  local  field  of  image  velocity  vectors.  Yuille  and 
Grzywacz  (1988;  1989)  proposed  a  general  computational  theory  that  smoothed  the  velocity  field  by 
minimizing  a  cost  function;  they  showed  that  this  theory,  which  they  called  "Motion  Coherence", 
produced  a  coherent  estimate  of  object  motion  and  could  account  for  a  number  of  psychophysical 
results.  Grzywacz,  Smith  and  Yuille  (1989)  extended  the  model  to  enforce  consistency  over  time 
as  well  as  over  space  ("temporal  coherence"). 
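Why component signals must be combined can be illustrated with the classic intersection-of-constraints construction for two components (Adelson and Movshon, 1982). This is a swapped-in textbook illustration, not the Motion Coherence algorithm itself; the function name is hypothetical, and each detector is idealized as reporting exactly the velocity component along its normal.

```python
import math


def intersection_of_constraints(theta1_deg, c1, theta2_deg, c2):
    """Recover a 2-D velocity from two 1-D component measurements.

    Each oriented detector is assumed to report only c_i = v . n_i, the speed
    along its unit normal n_i at angle theta_i (the aperture problem). Two
    non-parallel constraints determine v, solved here by Cramer's rule.
    """
    a1, b1 = math.cos(math.radians(theta1_deg)), math.sin(math.radians(theta1_deg))
    a2, b2 = math.cos(math.radians(theta2_deg)), math.sin(math.radians(theta2_deg))
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("parallel normals: velocity is not determined")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


if __name__ == "__main__":
    vx, vy = 1.0, 0.5  # true object velocity, for illustration
    comps = [(t, vx * math.cos(math.radians(t)) + vy * math.sin(math.radians(t)))
             for t in (30.0, 120.0)]
    print(intersection_of_constraints(comps[0][0], comps[0][1],
                                      comps[1][0], comps[1][1]))
```

Neither component alone specifies the object velocity; only their combination recovers (1.0, 0.5).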

The  current  version  of  this  theory  is  described  fully  in  the  accompanying  paper  (Grzywacz, 
Watamaniuk,  and  McKee).  Briefly,  it  consists  of  three  hierarchical  stages.  The  first,  which  we 
call  the  Local  stage,  estimates  motion  in  localized  regions  of  a  multi-dimensional  space  whose 
independent  variables  are  space,  time,  direction  of  motion,  speed  and  spatial  scale  (or  spatial 
frequency).  We  think  of  the  motion  measurements  at  this  stage  as  being  generated  by  primary 
motion detectors: units tuned to particular values of all these variables, except for time. The
second  stage,  the  Coherence  stage,  smooths  these  local  measurements  within  neighborhoods  in 
this  multi-dimensional  space.  Consequently,  this  stage  transforms  the  local  multi-dimensional 
space  into  a  new  multi-dimensional  space  defined  by  the  same  independent  variables.  The  final 
stage,  which  we  call  the  Outlier  stage,  implements  a  decision  rule,  so  that  the  results  of  the 
computer  simulations  can  be  compared  to  human  psychophysics.  It  simulates  a  search  within  a 
large  region  of  the  space  defined  by  the  motion  measurements,  but  confined  to  a  fairly  narrow 
range of directions. Using an approximation to the Dixon statistic often used to remove outliers (Dunn and Clark, 1987), the computer determines whether, within each directional band, there is a signal sufficiently larger than the rest of the measurements to be considered an outlier, and identifies this outlier as the signal dot.
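One common form of the Dixon outlier test, Dixon's Q, compares the gap between the largest value and its nearest neighbor with the total range. The sketch below applies it to the responses within one directional band; the critical value is a hypothetical free parameter (the report fits its own Outlier-stage threshold to data), and the report's approximation to the statistic may differ from this textbook form.

```python
def dixon_q_top(values):
    """Dixon's Q for the largest value: (x_max - x_next) / (x_max - x_min)."""
    ordered = sorted(values)
    if len(ordered) < 3 or ordered[-1] == ordered[0]:
        return 0.0  # too few values, or all equal: nothing stands out
    return (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])


def find_outlier(band_responses, q_crit):
    """Index of the largest response in one directional band if Dixon's Q
    exceeds the (hypothetical) criterion q_crit, otherwise None."""
    if dixon_q_top(band_responses) > q_crit:
        return max(range(len(band_responses)), key=band_responses.__getitem__)
    return None


if __name__ == "__main__":
    band = [1.0, 1.2, 1.1, 0.9, 5.0]  # one unit responds far above the rest
    print(dixon_q_top(band), find_outlier(band, q_crit=0.5))
```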

Since  the  Coherence  stage  is  the  most  important  for  explaining  our  results,  we  will  describe 
it  in  some  detail.  Conceptually,  there  are  three  components  to  the  stage.  One  component,  which 
we  call  neighborhood  coherence,  accounts  for  the  influence  of  the  surrounding  noise  on  the 
detection  of  the  signal  dot.  At  any  instant  of  time,  this  component  enforces  consistency  over  a 
neighborhood  within  the  multi-dimensional  space;  in  particular,  it  minimizes  differences  between 
local  units  within  a  small  spatial  region  whose  size  increases  proportionally  to  the  spatial  scale  of 
the local units. In our computer implementation, smoothing is only enforced between adjacent neighbors tuned to the same spatial scale and to similar directions (bandwidth = 45 deg). Consider what happens if a noise dot stimulates a similar unit adjacent to the unit responding to the trajectory. In minimizing the difference between the two units, the smoothing operation reduces
the  signal  from  the  unit  responding  to  the  trajectory  to  near  the  same  level  as  the  unit  responding  to 
the  noise  dot.  As  the  noise  density  is  increased,  the  chance  that  a  neighboring  unit  will  be 
stimulated  by  a  noise  dot  increases,  producing  a  frequent  reduction  in  the  response  to  the  signal  dot 
as  it  moves  along  its  trajectory.  At  very  high  densities,  the  response  to  the  signal  dot  is  seldom 
higher  than  the  response  to  various  noise  dots  within  the  display,  so  the  trajectory  dot  fails  to  meet 
the  statistical  criterion  of  the  Outlier  stage.  As  shown  in  Figure  6,  the  response  of  the  local  units  to 
the  signal  dot  scales  with  step  size,  being  maximal  for  a  unit  with  a  radius  2.3  times  the  step  size. 
In  the  implementation  of  the  model,  to  enforce  the  proportionality  between  the  spatial  scale  of  the 
local-motion  units  and  the  size  of  neighborhood  for  coherence,  the  smoothing  operation  was 
enforced  only  between  adjacent  units  of  the  same  spatial  scale.  Hence,  as  the  step  size  got  larger, 
the  optimal  scale  detecting  the  signal  dot  increased,  and  the  spatial  region  associated  with 
neighborhood  coherence  increased  proportionally.  This  property  explains  why  detection  was 
limited  by  probability  of  mismatch  (Figure  2). 

What  accounts  for  the  high  detection  rate  in  moderately  dense  noise?  The  countervailing 
force  is  the  "temporal  coherence"  component  which  reinforces  directional  consistency  over  time. 
Since  only  the  trajectory  dot  moves  in  a  consistent  direction,  the  signals  of  units  responding  to  the 
trajectory  are  enhanced  relative  to  signals  generated  by  the  noise.  The  immediate  past  information 
from units in the Coherence stage is compared to present information. Activity in a directionally-selective unit implies a direction from which the signal could have come; the temporal coherence
component  checks  whether  there  was  the  same  or  a  similar  directional  signal  at  a  position 
corresponding  to  the  implicit  location  of  the  motion  signal  in  the  immediate  past.  In  the 
implementation  used  to  model  our  psychophysical  results,  only  neighboring  units  at  the  same  scale 
and  similar  directional  bandwidth  were  compared.  Finally,  although  both  the  neighborhood  and 
temporal  coherence  components  smooth  responses  from  the  primary  motion  units,  the  coherence 
measurements  are  still  constrained  to  agree  with  the  measurements  from  the  Local  stage,  so  a  third 
component  minimizes  differences  between  the  Coherence  units  and  the  primary  motion  units  of  the 
Local  stage. 

A mathematical representation of the Coherence stage would find the responses R_c that minimize
the  following  energy  function: 

$$E(t) = \ldots \qquad (3)$$
The directionally selective cells in both the Local and Coherence stages can be characterized by the center of the receptive field (r), the preferred direction of motion (u), the preferred speed (s), and the spatial scale or receptive field size (λ). The activity of one such cell in the Local stage can be written as R_l(t : r, u, s, λ), where t is time and the quantities following the colon are parameters of the cell. Similarly, the activity of the corresponding cell in the Coherence stage can be written as R_c(t : r, u, s, λ). The first term of equation (3) imposes consistency with the local measurements from the primary motion units, since it minimizes the differences between R_c and R_l. The second term smooths the responses across neighborhoods by reducing the magnitudes of the derivatives D_j R_c. These D_j operators would normally be partial derivatives, gradients or more general operators using a combination of derivatives of various orders (Yuille and Grzywacz, 1989). However, these operators are also functions of the local data, and are constrained in two important ways. First, if the local data (R_l) change too quickly in any direction in the multi-dimensional space, then data around that point are not used in the derivative estimate. Second, the spatial operator is a function of λ, that is, D_r = D_r(λ, R_l).

This constrains the size of the spatial neighborhood to be proportional to its spatial scale. One way to implement this requirement is to keep the inter-neighbor spatial distance proportional to λ and to estimate derivatives from neighbor responses. Another way is to define D_r as an operator using a combination of derivatives of various orders, which implements smoothing with a Gaussian approximation whose standard deviation can be made proportional to λ (Yuille and Grzywacz, 1989).
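The consequence of making the neighborhood proportional to λ can be seen in a toy one-dimensional sketch. A normalized Gaussian filter stands in for the smoothing operator here, an assumption made for brevity rather than the energy minimization of equation (3), and the proportionality constants are hypothetical. Because both the neighbor spacing and the standard deviation scale with λ, the filter weights depend only on neighbor index, so a response pattern sampled on each unit's own grid is smoothed identically at every scale.

```python
import math


def gaussian_smooth(samples, spacing, sigma):
    """Normalised Gaussian smoothing of responses sampled at i * spacing."""
    smoothed = []
    for i in range(len(samples)):
        weights = [math.exp(-(((i - j) * spacing) ** 2) / (2.0 * sigma ** 2))
                   for j in range(len(samples))]
        smoothed.append(sum(w * r for w, r in zip(weights, samples)) / sum(weights))
    return smoothed


def smooth_at_scale(samples, lam, spacing_per_lam=0.5, sigma_per_lam=1.0):
    """Neighbour spacing and Gaussian s.d. both proportional to the scale lam."""
    return gaussian_smooth(samples, spacing_per_lam * lam, sigma_per_lam * lam)


if __name__ == "__main__":
    # A signal response next to one noise response, on each unit's own grid.
    responses = [0.0, 0.0, 1.0, 0.6, 0.0, 0.0]
    print([round(v, 3) for v in smooth_at_scale(responses, lam=0.1)])
    print([round(v, 3) for v in smooth_at_scale(responses, lam=0.4)])
```

The two printed rows match: a display magnified along with the unit's scale, as in the self-similar displays, is treated identically, which is the property that ties detection to the probability of mismatch.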

The bottom integrals impose temporal coherence by using an integrand that is reminiscent of the "image constraint equation" of Fennema and Thompson (1979) and Horn and Schunck (1981). The main difference between their formulation and that in Equation 3 is that the new equation uses spatio-temporal derivatives of the response instead of brightness. In a sense, this equation measures the movement of the local response, rather than the movement of the image. The other difference is that Equation 3 eliminates the dependence on speed (s') through integration with the weight (probability density) function W(s'). To obtain the bottom integrand (except for W), one postulates that there is a minimal difference between the responses of a cell "A" with a particular preferred direction and the past responses of other cells with the same preferred direction whose receptive fields are located at positions from which the preferred direction points to the receptive field of "A". Mathematically, this difference can be approximated as:

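$$R_c(t : r, u, s, \lambda) - R_c(t - \Delta t : r - s'\Delta t\, u, u, s, \lambda) \approx \Delta t \left( \frac{\partial R_c}{\partial t} + s'\, u \cdot \nabla_r R_c \right)$$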
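The first-order approximation, which equates the difference between a cell's response and the past response of an upstream cell with Δt (∂R_c/∂t + s' u·∇_r R_c), can be checked numerically in one spatial dimension. The sketch assumes a hypothetical Gaussian response profile translating at a fixed speed along the preferred direction. The residual ∂R/∂t + s' ∂R/∂x vanishes when the trial speed s' matches the profile's speed, which is what the temporal-coherence term rewards, and for small Δt the finite difference agrees with Δt times that residual.

```python
import math


def response(t, x, speed, width=1.0):
    """Hypothetical 1-D response profile translating along +x at `speed`."""
    return math.exp(-((x - speed * t) ** 2) / (2.0 * width ** 2))


def temporal_residual(t, x, s_prime, speed, eps=1e-4):
    """Central-difference estimate of dR/dt + s' * dR/dx (u taken as +x)."""
    dr_dt = (response(t + eps, x, speed) - response(t - eps, x, speed)) / (2 * eps)
    dr_dx = (response(t, x + eps, speed) - response(t, x - eps, speed)) / (2 * eps)
    return dr_dt + s_prime * dr_dx


def coherence_difference(t, x, s_prime, speed, dt):
    """R(t : x) - R(t - dt : x - s' dt), the left side of the approximation."""
    return response(t, x, speed) - response(t - dt, x - s_prime * dt, speed)


if __name__ == "__main__":
    for s_prime in (1.0, 2.0, 3.0):  # the profile actually moves at speed 2
        res = temporal_residual(0.0, 0.5, s_prime, speed=2.0)
        diff = coherence_difference(0.0, 0.5, s_prime, speed=2.0, dt=1e-3)
        print(f"s'={s_prime}: residual={res:+.4f}, diff/dt={diff / 1e-3:+.4f}")
```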

To  test  whether  this  computational  model  could  reproduce  our  results,  we  used  a  simplified 
implementation  to  simulate  the  conditions  of  our  experiments  (see  enclosed  paper  for  details  of  the 
computational  methods).  Two  parameters,  the  value  of  the  detection  threshold  for  the  Outlier  stage 
and  the  standard  deviation  of  an  internal  noise,  were  estimated  from  the  subjects'  performance  in 
one  standard  experimental  condition  —  a  straight  260  msec  trajectory  signal  presented  at  a 
probability  of  mismatch  of  0.3. 

Two  results  from  this  simulation  are  described  next.  First,  it  was  shown  that  the  computer 
simulation  could  mimic  the  decline  in  performance  with  increasing  probability  of  mismatch  for  a 
wide  range  of  step  sizes.  The  straight  line  in  Figure  7  shows  the  predicted  performance  from  the 
computer  simulation;  data  from  the  subjects  are  shown  for  comparison.  The  model  predictions  are 
in good agreement with human performance except at very low mismatch probabilities. As a second challenge, we simulated the response of the model when the directions of the vectors in the
random  motion  noise  were  confined  to  a  bandwidth  of  180  degrees  (see  Figure  3  above). 
Qualitatively,  the  simulation,  shown  by  the  solid  and  dashed  lines  in  Figure  8  below,  resembles  the 
performance  of  the  subject.  The  only  significant  discrepancy  was  the  higher  sensitivity  of  the 
model  when  the  difference  between  the  direction  of  the  signal  and  the  mean  direction  of  the  noise  is 
around  135  deg. 


Figure 7 (two panels; x-axis: Probability of Mismatch)

For computational simplicity, the local motion units that provided the input to the coherence stage were very crude. They responded equally to any signal falling within a 45 deg bandwidth and gave no response outside this range. This sharp tuning, which is biologically implausible, may account for the discrepancy.
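The all-or-none tuning of these local units can be sketched as follows. We assume the 45 deg band is centred on each unit's preferred direction (±22.5 deg) and that eight units spaced 45 deg apart cover the circle; the graded Gaussian alternative and its width `sigma_deg` are hypothetical and are included only to contrast the sharp tuning with a more biologically plausible profile.

```python
import math

def angular_difference(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0-180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def crude_unit_response(stim_dir_deg, pref_dir_deg, bandwidth_deg=45.0):
    """All-or-none tuning: 1.0 inside the passband, 0.0 outside.
    Assumes the band is centred on the preferred direction (+/- bandwidth/2)."""
    inside = angular_difference(stim_dir_deg, pref_dir_deg) <= bandwidth_deg / 2
    return 1.0 if inside else 0.0

def smooth_unit_response(stim_dir_deg, pref_dir_deg, sigma_deg=20.0):
    """Hypothetical graded alternative: Gaussian fall-off with direction."""
    d = angular_difference(stim_dir_deg, pref_dir_deg)
    return math.exp(-0.5 * (d / sigma_deg) ** 2)

if __name__ == "__main__":
    # A unit preferring 0 deg, probed on either side of its band edge.
    for stim in (20.0, 22.5, 25.0):
        print(stim, crude_unit_response(stim, 0.0),
              round(smooth_unit_response(stim, 0.0), 3))
```

A 2.5 deg change in signal direction can switch the crude unit fully on or off, whereas the graded unit changes only slightly; abrupt steps of this kind are one way a model's sensitivity could depart from human data at particular signal-noise direction differences.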

In addition, the model as implemented simulates human performance with circular trajectories. Sequential activity in adjacent motion units tuned to the same or similar directions is enhanced by temporal coherence relative to the signals generated by the noise. The model predictions show the same improvement in performance with increasing duration as that exhibited by the human observers.
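The core idea of sequential facilitation can be illustrated with a deliberately minimal toy, which is not the implementation used in our simulations: a one-dimensional ring of cells that all prefer rightward motion of one cell per frame, where a cell's response is boosted by the previous response of its upstream neighbour. The facilitation weight `w`, the ring size, and the update rule are hypothetical. Curved trajectories would additionally require facilitation between units tuned to similar, not only identical, directions, which this sketch omits.

```python
import random

def run_chain(positions, n_cells=50, w=0.6):
    """Toy temporal-coherence chain. `positions` lists the cell stimulated on
    each frame; returns the peak response on each frame."""
    prev = [0.0] * n_cells
    peaks = []
    for p in positions:
        cur = [0.0] * n_cells
        p %= n_cells
        upstream = (p - 1) % n_cells        # neighbour that precedes p in the preferred direction
        cur[p] = 1.0 + w * prev[upstream]   # facilitation from that neighbour's last response
        prev = cur
        peaks.append(max(cur))
    return peaks

if __name__ == "__main__":
    signal = run_chain(list(range(10)))     # consistent rightward steps
    rng = random.Random(0)
    walk = [0]
    for _ in range(9):                      # random left/right steps
        walk.append(walk[-1] + rng.choice((-1, 1)))
    noise = run_chain(walk)
    print("signal:", [round(r, 2) for r in signal])
    print("noise: ", [round(r, 2) for r in noise])
```

For consistent motion the response grows frame by frame toward 1/(1 − w), so longer trajectories stand out more, whereas a dot whose direction changes randomly gains facilitation only on the occasional frame in which it happens to step in the preferred direction.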


Figure  8 

II.  OTHER  PROJECTS 
A.  Motion  Transparency 

Motion transparency challenges some contemporary motion models, because the visual system appears to assign two different directions and/or speeds to the same location. Most experimental studies of motion transparency have used the familiar plaid pattern, composed of two superimposed drifting gratings presented at different orientations. In some conditions, the drifting gratings cohere into a rigid structure moving in a single direction, while in others, the gratings appear to slide across one another ("transparency"). Sparse random dot patterns have also been used to create the appearance of transparent surfaces that glide past each other in opposite directions (Ullman, 1984; Treue, Husain and Andersen, 1991).

Bravo and Watamaniuk (see enclosed paper) used a sparse random dot cinematogram to study the motion transparency produced by differences in speed. In their display, all dots moved in the same direction at one of two speeds. Two transparent surfaces were seen provided that 1) the two speeds differed substantially (by at least a factor of two), and 2) dots moving at each of the two speeds were presented simultaneously. Rapid synchronous alternation between the two speeds (every 20-40 msec) did not produce transparency. When segmentation into transparent surfaces occurred, observers could discriminate the speed of the dots associated with one surface without any interference from the dots moving on the other surface. Bravo and Watamaniuk showed that the initial segmentation into two surfaces was based on a very imprecise speed signal. After segmentation, the visual system integrated signals within each of the two transparent surfaces to improve the precision of the speed signal. The Smith-Grzywacz transparency model can account for the initial segmentation based on speed, and probably for the subsequent integration of signals within planes. The Nowlan-Sejnowski model (Nowlan and Sejnowski, 1993) may also be able to account for these results.
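The two display conditions contrasted in that study can be sketched as a frame generator. All numerical settings here (dot count, the 2 and 6 deg/s speeds, the 30 msec frame duration, the 10 deg square field) are hypothetical placeholders rather than the published parameters; the sketch only encodes the distinction between both speeds being present on every frame and all dots switching speed together from frame to frame.

```python
import random

def make_frames(n_dots, n_frames, v_slow, v_fast, mode,
                frame_dt=0.03, field=10.0, seed=0):
    """Return a list of frames; each frame is a list of (x, y, speed) tuples.
    All dots move rightward; x wraps around a square field of side `field` (deg).
    mode='simultaneous': half the dots carry each speed on every frame.
    mode='alternating': every dot switches speed together on each frame."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, field) for _ in range(n_dots)]
    ys = [rng.uniform(0, field) for _ in range(n_dots)]
    frames = []
    for f in range(n_frames):
        if mode == "simultaneous":
            speeds = [v_slow if i % 2 == 0 else v_fast for i in range(n_dots)]
        elif mode == "alternating":
            v = v_slow if f % 2 == 0 else v_fast
            speeds = [v] * n_dots
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Displacement per frame = speed (deg/s) * frame duration (s).
        xs = [(x + v * frame_dt) % field for x, v in zip(xs, speeds)]
        frames.append(list(zip(xs, ys, speeds)))
    return frames

if __name__ == "__main__":
    for mode in ("simultaneous", "alternating"):
        frames = make_frames(20, 4, 2.0, 6.0, mode)
        print(mode, [sorted({s for _, _, s in fr}) for fr in frames])
```

Both conditions contain the same two speeds over time; only the simultaneous condition offers both speeds within a single frame, which is the condition under which transparency was reported.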

To summarize, the results from the study by Bravo and Watamaniuk suggest that 1) the motion system calculates local velocities; 2) it determines whether there is a single candidate velocity or several "winners"; 3) it segments the visual field into planes based on the number of winning local velocities; and 4) it then integrates within each of the planes specified by the local velocities to improve the precision of the speed signal.
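The four steps can be illustrated with a toy pipeline. This is not the Smith-Grzywacz or Nowlan-Sejnowski model; the coarse histogram used to pick "winners", its bin width, the 10% minimum share, the nearest-winner assignment, and the simulated speeds (2 and 6 deg/s with a per-dot standard deviation of 0.5 deg/s) are all hypothetical choices. The point is only that an imprecise initial segmentation, followed by averaging within each surface, yields a much more precise speed estimate per surface.

```python
import random
import statistics

def coarse_winners(speeds, bin_width=1.0, min_frac=0.1):
    """Step 2 (toy): coarse histogram of local speeds; a bin is a 'winner' if it
    holds at least min_frac of all dots and is a local maximum."""
    counts = {}
    for v in speeds:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    winners = []
    for b, c in counts.items():
        if (c >= min_frac * len(speeds)
                and c >= counts.get(b - 1, 0) and c > counts.get(b + 1, 0)):
            winners.append((b + 0.5) * bin_width)   # bin centre: an imprecise speed
    return sorted(winners)

def segment_and_integrate(speeds, winners):
    """Steps 3-4 (toy): assign each dot to the nearest winner ('surface'),
    then average within each surface. Returns (mean, standard error) pairs."""
    groups = {w: [] for w in winners}
    for v in speeds:
        nearest = min(winners, key=lambda w: abs(v - w))
        groups[nearest].append(v)
    return [(statistics.mean(g), statistics.stdev(g) / len(g) ** 0.5)
            for _, g in sorted(groups.items()) if len(g) > 1]

if __name__ == "__main__":
    rng = random.Random(1)
    # Step 1 (toy): noisy local speed estimates for two sets of dots (deg/s).
    speeds = ([rng.gauss(2.0, 0.5) for _ in range(500)]
              + [rng.gauss(6.0, 0.5) for _ in range(500)])
    winners = coarse_winners(speeds)
    print("coarse winners:", winners)
    for mean, sem in segment_and_integrate(speeds, winners):
        print(f"surface speed {mean:.2f} +/- {sem:.3f}")
```

The coarse winners can be off by up to half a bin, whereas the within-surface averages approach the true speeds, mirroring the proposed sequence of imprecise segmentation followed by integration.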

B.  Motion-in-Depth 

Objects that move in depth toward or away from the observer's eyes produce image motions on the two retinae that differ in speed or direction or both. As Regan (1993) noted in a recent short communication in Vision Research, there are two alternative ways to encode motion-in-depth: the brain may use the difference (ratio) of the two monocular velocities (Beverley and Regan, 1973, 1975) or the change in disparity over time. Cumming and Parker (1994) created a stereoscopic stimulus that had no correlated motion in either of the monocular half-images. They manipulated the disparity of the stimulus over time so that the stimulus appeared to move in depth. Motion-in-depth thresholds for this stimulus were equal to or better than similar thresholds measured with stimuli containing correlated motion in the half-images. Cumming and Parker concluded that there was no neural mechanism sensitive to inter-ocular velocity differences, and that motion-in-depth was mediated by changes in disparity over time. However, subjects in this study were not required to make a pure motion judgment, so they may have been relying on disparity cues alone. Harris and Watamaniuk (see enclosed paper) used the Cumming and Parker stimulus (no monocularly correlated motion) to measure speed discrimination for targets moving in depth. Although this target appeared to move in depth, their subjects were unable to judge its speed. Subjects could judge the speed of motion-in-depth precisely only when the monocular half-images contained motion information. A series of experimental manipulations revealed that subjects had to compare the speeds of the two half-images in order to make this judgment. Thus, there must be a neural mechanism sensitive to inter-ocular velocity differences.
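For a physical target the two candidate signals carry the same information: if horizontal disparity is defined as d(t) = x_L(t) − x_R(t), then its rate of change equals the inter-ocular velocity difference v_L − v_R. This is why dissociating them requires special stimuli such as one without monocularly correlated motion. The sketch below checks the identity numerically on sampled image positions; the sign convention for disparity and the sample trajectories are our assumptions.

```python
def disparity_rate(xl, xr, dt):
    """Change-of-disparity cue: finite-difference rate of d(t) = xL(t) - xR(t)."""
    d = [a - b for a, b in zip(xl, xr)]
    return [(d1 - d0) / dt for d0, d1 in zip(d, d[1:])]

def velocity_difference(xl, xr, dt):
    """Inter-ocular velocity-difference cue: vL(t) - vR(t)."""
    vl = [(b - a) / dt for a, b in zip(xl, xl[1:])]
    vr = [(b - a) / dt for a, b in zip(xr, xr[1:])]
    return [a - b for a, b in zip(vl, vr)]

if __name__ == "__main__":
    dt = 0.01                                   # 10 msec samples
    # Images moving in opposite directions, as for motion along the line of sight.
    xl = [0.5 * k * dt for k in range(6)]       # left-eye image, +0.5 deg/s
    xr = [-0.5 * k * dt for k in range(6)]      # right-eye image, -0.5 deg/s
    print(disparity_rate(xl, xr, dt))
    print(velocity_difference(xl, xr, dt))
    # Lateral motion: both images move together, so both cues are zero.
    same = [0.5 * k * dt for k in range(6)]
    print(disparity_rate(same, same, dt), velocity_difference(same, same, dt))
```

Because the two computations agree for any real trajectory, only the stimulus design, not the geometry, can isolate one cue from the other.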

C.  Detecting  a  Trajectory  in  Three-Dimensional  Noise 

One function of stereopsis is to break camouflage. Is a trajectory moving in 3D through three-dimensional noise more easily detected than a 2D trajectory presented in two-dimensional noise (our standard condition)? We measured detection of trajectory motion as a function of the number of noise dots filling a cylindrical space viewed stereoscopically. We compared 3D detection of the trajectory to a 2D control condition in which the subjects viewed one of the half-images of the stereogram binocularly. There was a consistent improvement in detection in the 3D condition, but it was surprisingly small: about 25% at best. We concluded that the neural representation of three-dimensional space is not isotropic. Human resolution of the third dimension is very coarse, so detection of features embedded in three-dimensional noise is not aided much by stereopsis. A manuscript describing these results is in preparation.